
Multi-Agent Q-Learning




SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Neural Information Processing Systems

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in the global reward game; however, its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory. We generalise the Shapley value to the Markov convex game, yielding the Markov Shapley value (MSV), and apply it as a value factorisation method in the global reward game, justified by the equivalence between the two games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve the SBOE.
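The classic Shapley value that MSV generalises assigns each agent its average marginal contribution over all orders in which agents could join the coalition. A minimal sketch of that baseline computation (the `coalition_value` game below is a hypothetical example, not the paper's Markov formulation):

```python
from itertools import permutations

def shapley_values(agents, coalition_value):
    """Classic Shapley value: average each agent's marginal contribution
    over all join orders. MSV extends this idea to Markov convex games."""
    values = {a: 0.0 for a in agents}
    orders = list(permutations(agents))
    for order in orders:
        coalition = set()
        for a in order:
            before = coalition_value(frozenset(coalition))
            coalition.add(a)
            # Marginal contribution of agent a to the current coalition.
            values[a] += coalition_value(frozenset(coalition)) - before
    return {a: v / len(orders) for a, v in values.items()}

# Hypothetical additive game: each agent contributes a fixed weight.
weights = {"a": 1.0, "b": 2.0}
game = lambda coalition: sum(weights[x] for x in coalition)
print(shapley_values(["a", "b"], game))  # additive game: each gets its own weight
```

For an additive game the Shapley value recovers each agent's individual contribution, which mirrors the efficiency and fairness properties the paper builds on.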


Multi-agent Reinforcement Learning Paper Reading QPLEX

#artificialintelligence

In the previous article, I shared the paper (you can follow the link below to recap!): Weighted QMIX: Expanding Monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning, which argues that previous approaches such as VDN and QMIX can only factorize a small class of tasks, and proposes a new framework to overcome the issue. In this article, I am going to share another way to factorize any factorizable task, called QPLEX! Most multi-agent approaches follow the popular paradigm of centralized training with decentralized execution (CTDE). In this paradigm, the Individual-Global-Max (IGM) principle plays an important role. However, many methods relax IGM consistency in order to achieve scalability.
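The IGM principle says that the tuple of each agent's individually greedy actions must also maximise the joint value, so execution can be fully decentralised. A small sketch that checks this property by brute force for a given factorisation (the additive `joint_q` below is a VDN-style example for illustration; QPLEX uses a richer duplex dueling form):

```python
import itertools
import numpy as np

def igm_holds(per_agent_qs, joint_q):
    """Check the Individual-Global-Max (IGM) principle: the per-agent
    greedy actions must jointly maximise joint_q over all joint actions."""
    greedy = tuple(int(np.argmax(q)) for q in per_agent_qs)
    joint_actions = itertools.product(*[range(len(q)) for q in per_agent_qs])
    best = max(joint_actions, key=joint_q)
    return greedy == best

# Two agents with two actions each; VDN-style additive factorisation.
per_agent_qs = [np.array([1.0, 3.0]), np.array([2.0, 0.5])]
joint_q = lambda a: per_agent_qs[0][a[0]] + per_agent_qs[1][a[1]]
print(igm_holds(per_agent_qs, joint_q))  # additive mixing always satisfies IGM
```

Additive (VDN) and monotonic (QMIX) mixing satisfy IGM by construction but restrict the joint value functions they can represent; that restriction is what QPLEX aims to remove while keeping IGM intact.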


Untangling Braids with Multi-agent Q-Learning

Khan, Abdullah, Vernitski, Alexei, Lisitsa, Alexei

arXiv.org Artificial Intelligence

We use reinforcement learning to tackle the problem of untangling braids. We experiment with braids with 2 and 3 strands. Two competing players learn to tangle and untangle a braid. We interface the braid untangling problem with the OpenAI Gym environment, a widely used way of connecting agents to reinforcement learning problems. The results provide evidence that the more we train the system, the better the untangling player becomes at untangling braids. At the same time, our tangling player produces good examples of tangled braids.
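A Gym-style interface means the braid problem is exposed through `reset()` and `step(action)` calls that agents interact with. A minimal sketch of how such an environment might look (hypothetical: the paper's actual state encoding, action set, and reward are not reproduced here; a braid word is represented as a list of signed generators, with `+i`/`-i` for sigma_i and its inverse, and only free cancellation is modelled):

```python
class BraidEnv:
    """Gym-style environment sketch for braid untangling (illustrative only).
    State: a braid word as a tuple of signed generators.
    Action: append a generator; adjacent inverse generators cancel."""

    def __init__(self, strands=3, max_len=20):
        self.strands = strands
        self.max_len = max_len
        self.word = []

    def reset(self, word=None):
        self.word = list(word) if word else []
        return tuple(self.word)

    def step(self, generator):
        # Free reduction: sigma_i followed by sigma_i^{-1} cancels.
        if self.word and self.word[-1] == -generator:
            self.word.pop()
        else:
            self.word.append(generator)
        done = len(self.word) == 0 or len(self.word) >= self.max_len
        # Untangler's reward: shorter words are better (negative length).
        reward = -float(len(self.word))
        return tuple(self.word), reward, done, {}

env = BraidEnv(strands=3)
env.reset([1])
obs, reward, done, info = env.step(-1)  # sigma_1^{-1} cancels sigma_1
print(obs, done)  # empty word, episode done
```

In the self-play setup the tangling player would take actions that lengthen the word and the untangling player actions that shorten it, with opposite rewards.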